@Mukulyadav2004
Contributor

fixes #1916
Implements row name extraction from named vectors in data.table() and as.data.table() calls when keep.rownames=TRUE or keep.rownames="column_name". This matches the behavior of data.frame() and as.data.frame.list() by extracting names from the first named atomic vector in the input.

Problem: Currently, data.table() and as.data.table() do not extract row names from named vectors the way data.frame() does.
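For reference, the base R behavior being matched can be checked with a short snippet. This is an illustrative check rather than code from the PR; the vectors mirror the ones used in the tests later in this thread.

```r
# Base R: data.frame() picks up row names from a named atomic vector.
x <- c(1, 2, 3)
y <- setNames(c(4, 5, 6), c("A", "B", "C"))

df <- data.frame(x, y)
rownames(df)
# [1] "A" "B" "C"
```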

So I added logic to as.data.table.list() (the common path for both data.table() and as.data.table()) to:

  • Iterate through input vectors to locate the first atomic vector with valid names.
  • Store the names and remove them from the source vector to avoid duplication.
  • Prepend row names as a new column with name "rn" or custom name.
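The three steps above can be sketched in base R on a plain list. This is a minimal illustration, not the code in the PR: `extract_vector_rownames` is a hypothetical helper name, and it treats any non-NULL names as valid, which is a simplifying assumption.

```r
# Hypothetical helper (not data.table source): mimics the three steps on a plain list.
extract_vector_rownames <- function(x, keep.rownames = TRUE) {
  if (isFALSE(keep.rownames)) return(x)
  # Column name for the row names: "rn" by default, or the user-supplied string.
  rn_name <- if (is.character(keep.rownames)) keep.rownames[1L] else "rn"
  for (i in seq_along(x)) {
    xi <- x[[i]]
    nm <- names(xi)
    # Step 1: locate the first atomic vector that carries names.
    if (is.atomic(xi) && !is.null(nm)) {
      # Step 2: strip the names from the source vector to avoid duplication.
      x[[i]] <- unname(xi)
      # Step 3: prepend the names as a new leading element.
      rn <- list(nm)
      names(rn) <- rn_name
      return(c(rn, x))
    }
  }
  x  # no named atomic vector found: return the input unchanged
}

x <- c(1, 2, 3)
y <- setNames(c(4, 5, 6), c("A", "B", "C"))
res <- extract_vector_rownames(list(V1 = x, V2 = y), keep.rownames = "custom")
names(res)
# [1] "custom" "V1"     "V2"
```

Only the first named vector is consulted, so the result does not depend on names carried by later inputs.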

@MichaelChirico @tdhock can you please review.

@github-actions

github-actions bot commented Jul 7, 2025

  • HEAD=issue_1916 stopped early for DT[by,verbose=TRUE] improved in #6296
    Comparison Plot

Generated via commit 6dbc414

Download link for the artifact containing the test results: ↓ atime-results.zip

Task                                    Duration
R setup and installing dependencies     2 minutes and 37 seconds
Installing different package versions   38 seconds
Running and plotting the test cases     2 minutes and 30 seconds

@codecov

codecov bot commented Jul 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.50%. Comparing base (ed2df98) to head (89e9ef4).
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7136   +/-   ##
=======================================
  Coverage   98.50%   98.50%           
=======================================
  Files          81       81           
  Lines       15016    15032   +16     
=======================================
+ Hits        14792    14808   +16     
  Misses        224      224           

@MichaelChirico
Member

Hey @Mukulyadav2004, please ping when tests are passing, or if you're stuck & need an extra pair of eyes. Thanks!

@Mukulyadav2004
Contributor Author

Thanks @MichaelChirico! Both of these tests pass when I run them in RStudio, but I can't figure out why they are failing here.

if (any(nzchar(valid_names))) {
vector_rownames = valid_names
x[[i]] = unname(xi)
break
Member

I am leaning towards merging this logic into the below loop. We can write check_rownames = !isFALSE(keep.rownames) and then the marginal cost of checking if (check_rownames && ...) is low. We can replace the early break with checking is.null(vector_rownames). WDYT?
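A rough sketch of what that merged loop might look like, in base R. Everything here other than `check_rownames`, `vector_rownames`, and `keep.rownames` is a hypothetical stand-in (including the `process_columns` name and the length bookkeeping), not the actual data.table source.

```r
# Hypothetical stand-in for the existing per-column loop; not data.table source.
process_columns <- function(x, keep.rownames = FALSE) {
  check_rownames <- !isFALSE(keep.rownames)
  vector_rownames <- NULL
  n <- integer(length(x))
  for (i in seq_along(x)) {
    xi <- x[[i]]
    # is.null(vector_rownames) replaces the early break: once names are found,
    # later named vectors are left untouched.
    if (check_rownames && is.null(vector_rownames) &&
        is.atomic(xi) && !is.null(names(xi))) {
      vector_rownames <- names(xi)
      x[[i]] <- xi <- unname(xi)
    }
    n[i] <- length(xi)  # placeholder for the work the loop already does per column
  }
  list(columns = x, rownames = vector_rownames, lengths = n)
}

out <- process_columns(
  list(c(1, 2, 3), c(A = 4, B = 5, C = 6), c(D = 7, E = 8, F = 9)),
  keep.rownames = TRUE
)
out$rownames
# [1] "A" "B" "C"
```

When keep.rownames is FALSE, the added condition short-circuits on `check_rownames`, so the extra cost per iteration is a single logical check.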

Contributor Author

agree

x <- c(1, 2, 3)
y <- setNames(c(4, 5, 6), c("A", "B", "C"))
test(2329.1, as.data.table(list(x, y), keep.rownames=TRUE), data.table(rn=c("A", "B", "C"), V1=c(1, 2, 3), V2=c(4, 5, 6)))
test(2329.2, as.data.table(list(x, y), keep.rownames="custom"), data.table(custom=c("A", "B", "C"), V1=c(1, 2, 3), V2=c(4, 5, 6)))
Member

Thanks! Let's make the test suite more extensive:

  • Test also list(y, x)
  • Test the behavior under data.frame(), not just as.data.table
  • Test your condition about any(nzchar(valid_names))

I also don't think we've matched data.frame behavior yet, c.f.

DF = data.frame(row.names = letters, V = 1:26)
head(rownames(data.frame(a = 26:1, DF)))
# [1] "a" "b" "c" "d" "e" "f"

@MichaelChirico
Member

> @MichaelChirico @tdhock can you please review.

You were affected by an underlying change in master, see #7149 -- sorry about that.

@MichaelChirico
Member

@Mukulyadav2004 I'm curious about the any(nzchar(nm)) condition -- where did it come from?

Observe that we already can produce ""-only rn column from a matrix today:

M = cbind(1:3)
rownames(M) = rep("", 3L)
as.data.table(M, keep.rownames='blank')
#    blank V1
# 1:        1
# 2:        2
# 3:        3

Maybe we should drop that condition here too?

@Mukulyadav2004
Contributor Author

Mukulyadav2004 commented Jul 9, 2025

I included the any(nzchar(nm)) condition because I was trying to mimic the behavior of base R's data.frame() function, which typically only uses non-empty names as row names.
However, you're right: we should preserve empty-string rownames.

@MichaelChirico
Member

Thanks! The data.frame behavior makes sense, but yes, let's break from that. Consistency within the data.table API is more important.

@Mukulyadav2004
Contributor Author

Mukulyadav2004 commented Jul 9, 2025

learned a lot from your changes, thank you.

test(2330.5, as.data.table(data.frame(y, x), keep.rownames=TRUE), data.table(rn=c("A", "B", "C"), y=4:6, x=1:3))

DF <- data.frame(row.names = letters[1:6], V = 1:6) # Test data.frame with explicit rownames
test(2330.6, as.data.table(list(a = 6:1, DF), keep.rownames=TRUE), data.table(rn=letters[1:6], a=6:1, V=1:6))
Member

Rather than remove the test with all-empty names, let's test the expected behavior in that case as well.

Please also add a test of list(M) for empty-rowname'd matrix input.

Member

@MichaelChirico MichaelChirico left a comment

I think it's basically ready to go; please don't forget a NEWS entry and an updated manual description.

Member

@MichaelChirico MichaelChirico left a comment

Great, thank you! I'm going to start using this right away!!

@MichaelChirico MichaelChirico merged commit bfa049c into master Jul 10, 2025
9 of 10 checks passed
@MichaelChirico MichaelChirico deleted the issue_1916 branch July 10, 2025 17:00

Development

Successfully merging this pull request may close these issues.

data.table(keep.rownames = TRUE) should preserve names from vectors
